Nucleation of Biomolecular Condensates from Finite-Sized Simulations

The nucleation of protein condensates is a concentration-driven process of assembly. When modeled in the canonical ensemble, condensation is affected by finite-size effects. Here, we present a general and efficient route for obtaining ensemble properties of protein condensates in the macroscopic limit from finite-sized nucleation simulations. The approach is based on a theoretical description of droplet nucleation in the canonical ensemble and enables estimation of thermodynamic and kinetic parameters, such as the macroscopic equilibrium density of the dilute protein phase, the surface tension of the condensates, and nucleation free energy barriers. We apply the method to coarse-grained simulations of NDDX4 and FUS-LC, two phase-separating disordered proteins with different physicochemical characteristics. Our results show that NDDX4 condensate droplets, characterized by lower surface tension, higher solubility, and faster monomer exchange dynamics compared to those of FUS-LC, form with negligible nucleation barriers. In contrast, FUS-LC condensates form via an activated process over a wide range of concentrations.

B iomolecular compartments that are not bound by membranes have attracted a great deal of attention in the past decade because of their important role in cellular organization. 1,2 The assembly of these membraneless organelles (MLOs) is driven by the formation of dynamical multivalent interactions between proteins and/or nucleic acids, 1,2 often following a nucleation mechanism. 3,4 Notably, many proteins involved in forming such compartments are either intrinsically disordered or have highly flexible domains; typical examples include proteins from the DEADbox 5−7 and FET 8−10 families. The interaction between biomacromolecules in the formation of MLOs was described according to the stickers-and-spacers framework, derived for associative polymers. 9,11 In this framework, polymer chains are characterized by multivalent domains or motifs, named stickers, that govern intermolecular interactions, interspersed with spacer domains that influence the material properties of the condensates. 9 Experiments with simplified systems with one to a few of these disordered protein regions have shown that they can lead to the formation of assemblies via a process of liquid−liquid phase separation (LLPS). 5,6,8 Still, the phenomenon in cells could be more complex, involving different molecular mechanisms. 12−14 Molecular simulations of simplified systems have provided important insights into the mechanisms and molecular drivers for the formation of biomolecular assemblies. Explicit solvent molecular dynamics (MD) simulations offer a detailed picture of the intermolecular interactions and relationships between local structure and phase separation propensity. 15−18 Still, system sizes and time scales that can be investigated using atomistic MD are severely limited. To alleviate these difficulties, several coarse-grained (CG) models with a onebead-per-residue resolution were proposed 19−23 and successfully applied to study phase-separating systems, to establish coexistence conditions, and to examine the effect of mutations and post-translational modifications on phase separation. 19,23,24 Unfortunately, even CG simulations suffer from size limitations, and particular strategies, such as the slab method, 19 have to be adopted to minimize finite-size effects in the study of phase separation processes. Indeed, in finite-sized systems, the free energy change associated with the assembly of a condensate droplet is a function of both the concentration and the total volume of the system. 25−29 This dependence emerges from the fact that, in small volumes, 30 the chemical potential of the environment surrounding a condensate droplet depends on its size, leading to qualitative and quantitative differences when compared to its macroscopic counterpart. 26,29,31 An elegant approach to account for finite-size effects is the modified liquid droplet (MLD) model, 25,26 which provides an expression for the nucleation free energy F(n) in the canonical ensemble under the same set of assumptions typically adopted by classical nucleation theory (CNT). Expressed as a function of the density of the dilute (ρ d ) and condensed (ρ c ) phases, the MLD F(n) reads: where β = 1/kT (where k is the Boltzmann constant and T is the temperature), N is the total number of chains contained in a simulation box of volume V, n is the number of chains in the condensate droplet, σ is the planar surface tension of the condensed phase, A(n) is the surface area of a droplet of the condensed phase formed by n chains, ρ c is the equilibrium molar density of the condensed phase, ρ d * is the equilibrium molar density of the dilute phase, ρ d°i s the total protein density N/V, and ρ d n is the density of the dilute phase in a system where n chains form a condensed-phase droplet at constant N and V: Examples of F(n) profiles for systems of increasing size are represented in Figure 1E, where the stationary points correspond to the critical nucleus size n* and to the size of the self-limiting steady-state droplet n ss , the values of which both increase in magnitude as the volume and the number of molecules in the system increase. For any concentration, one can always identify a threshold volume below which condensation is inhibited by finite-size effects because F(n) becomes a monotonically increasing function of droplet size n. 25−29 Here we adopt this theoretical framework to fully account for the finite-size effects described above to characterize the thermodynamics and kinetics of biomolecular condensation in MD simulations with tractable system sizes. We demonstrate the potential of our approach by focusing on two intrinsically disordered protein domains, FUS-LC and NDDX4, which are popular model systems for investigating biomolecular condensates. 6,9 While both proteins undergo LLPS at ambient temperature, they display markedly distinct physicochemical characteristics. Notably, FUS-LC is enriched in aromatic and/ or polar residues Gln and Ser 9 whereas the NDDX4 sequence is relatively abundant in charged residues organized in patches of opposite signs, significantly contributing to its condensation behavior. 6 We model these proteins using a sequence-specific CG model with a one-bead-per-residue resolution. Consecutive amino acids are connected using a harmonic potential with an equilibrium distance of 0.38 nm and a spring constant of 1 × 10 3 kJ mol −1 nm −2 . Nonbonded amino acids interact through electrostatic interactions and short-range contact potentials. Electrostatic interactions between charged amino acids are described with a Debye−Huckel potential with a screening length of 1 nm, corresponding approximately to an ionic strength of 100 mM. Short-range nonbonded interactions are modeled using a Lennard-Jones potential with residuedependent σ and ϵ (see Table S2) inspired by the stickersand-spacers framework. 9,32 In this respect, we take advantage of mutagenesis studies that suggested a key role of amino acids with large-sized aromatic or planar side chains in driving the phase separation of our model systems. 6,9,33 Thus, we define Arg, Phe, Tyr, Trp, and Gln residues as stickers (St) and all of the others as spacers (Sp), and we set the LJ potential as ϵ St−St = 1.5ϵ St−Sp = 3ϵ Sp−Sp . The absolute energy scale of these shortrange interactions is the only free parameter of our model, and we tuned it to reproduce the experimental densities of NDDX4 and FUS-LC proteins. 9,33,34 Functional forms of the interactions and parameters of the CG model are available in the Supporting Information.
Using this CG potential, we performed a large set of NVT simulations of NDDX4 or FUS-LC to probe condensation from homogeneous solutions sampling a range of densities and system sizes in cubic boxes (see Table 1). For all of the systems, we performed simulations for 1 μs to equilibrate the systems, followed by an additional 1 μs for analyses of the resulting steady state. All of the simulations were conducted at a temperature of 300 K and an ionic strength of 100 mM in GROMACS 2019.4 (see the Supporting Information for further details). We simulated supersaturation regimes sufficient to allow condensed-phase formation within a reasonable simulation time, while avoiding extremely high concentrations that could result in large condensates spanning across simulation-box periodic boundaries.
In the majority of simulations, small droplet condensates of stationary size n ss , corresponding to local minima in Figure 1E, were observed. We identified droplets according to a twodimensional geometric criterion, considering the protein interchain contacts and distances between the centers of mass (COM) of individual chains and the droplet COM (see the Supporting Information). Analysis of protein radial number density profiles from the COM of steady-state droplets (see Figure 1F,G) indicates that densities in the core are independent of the size and overall concentration of the simulated system. The mean densities in this region provide a robust estimate of the equilibrium dense-phase densities for NDDX4 (359 mg/mL) and FUS-LC (527 mg/mL), which are in good agreement with values obtained by slab coexistence simulations (NDDX4, 336 mg/mL; FUS-LC, 484 mg/mL) and experimental results (NDDX4, 380 mg/mL; 33 FUS-LC, 477 mg/mL 34 ) (see Coexistence Simulations in the Supporting Information). Conversely, the direct estimate of dilute-phase density from finite-volume nucleation simulations can suffer from artifacts that result in errors that can scale as V 1/4 . 35 We, therefore, rely on the MLD framework that, for finitesized systems, provides the following Gibbs−Thomson/Kelvin equation for the equilibrium density of the dilute phase: 25 and r ss is the radius of the steady-state droplet. Under the assumption of spherical droplets, one can explicitly introduce the dependence of the coexistence pressure and droplet radius r ss on the number of chains in the droplet and reformulate eq 3 as This equation allows us to characterize the assembly process by determining all of the key thermodynamic quantities from our finite-size simulations. Indeed, in eq 4, N, V, and T are defined by the simulation setup while ρ c and n ss can be directly obtained from the analysis of the droplet density profiles (see panels F and G of Figure 1 and additional details reported in the Supporting Information). Most importantly, the dilutephase density (ρ d *) and the surface tension (σ) can be computed by a global fitting of eq 4 to the data from our set of simulations performed at different values of N and V. Using this strategy, we estimated ρ d * to be 1.03 ± 0.32 mg/mL for FUS and 4.87 ± 1.44 mg/mL for NDDX4 (error bars indicate 95% confidence intervals from bootstrapped results). Both densities are in excellent agreement with estimates obtained using slab coexistence simulations based on the same CG model: 1.42 ± 0.30 and 4.78 ± 0.88 mg/mL for FUS-LC and NDDX4, respectively. Both the slab and nucleation results are comparable to the experimental dilute-phase values of 2 mg/ mL 9 and 7 mg/mL. 33 Surface tension σ is estimated to be 0.37 ± 0.11 mN/m for FUS-LC and 0.101 ± 0.06 mN/m for NDDX4, reflecting the higher hydrophilic character of NDDX4 compared with that of FUS-LC. These estimates are in excellent agreement with surface tension estimates computed from slab simulations, 36   (2) a n ss is the average steady-state cluster size with the block average error in parentheses. △ indicates a fluctuating cluster (unstable primary droplet or presence of multiple droplets) or that the box sizes are too small to distinguish the dilute and dense phases. × indicates no phase separation.
the droplet shape and from the broadening of the interface between phases.
To validate the equilibrium density and surface tension parameters obtained from fitting simulation data with eq 4, we compare the position of the minima in F(n) (eq 1), parametrized with ρ d * and σ obtained from fitting, with the steady-state droplet size measured in simulations (see Figure  S5). The excellent agreement shown by the parity line demonstrates the method's power for universal calculations of thermodynamic properties, such as surface tension and Importantly, the determination of ρ d * and σ (as well as ρ c ) enables the calculation of free energy profiles in the limit of an infinitely large simulation box. This is achieved by evaluating eq 1 in the limit where N ≫ n and V nv l . In this limit, ρ d n → ρ d°, and eq 1 reduces to the typical CNT expression for the reversible work of formation of a condensate droplet from a supersaturated dilute phase at constant density ρ d°: Using this approach, we can thus predict nucleation free energy barriers and critical nucleus sizes under conditions that should be optimal for comparison with experiments without the need for computationally demanding schemes that can mimic open-boundary/infinite-reservoir macroscopic conditions. 38,39 Figure 2 demonstrates how the proposed methodology provides a direct route to extract macroscopic thermodynamic parameters from multiple finite-sized simulations. Examples of finite-size free energy profiles obtained for NDDX4 and FUS-LC are provided in Figure 2A at one of the total densities simulated. Nucleation free energies in the macroscopic limit are instead reported for all densities investigated in Figure 2B. From the nucleation free energy profiles, we can obtain nucleation barrier estimates and therefore discuss relative nucleation kinetics in the limit of a macroscopic open system. For instance, the nucleation barriers for FUS-LC are consistently higher and are associated with larger critical nuclei, cf. NDDX4 (see Figure 2B), under the conditions studied. We note, however, that using a CNT-based model for nucleation with thermodynamic parameters evaluated using our computational approach indicates a crossover in the nucleation free energy barrier (and thus in the nucleation rates) for densities lower than those explicitly simulated. This can be seen in Figure 2D, where the nucleation free energy barrier for FUS-LC becomes lower than that of NDDX4 below 10 mg/mL. The critical nucleus sizes also display a crossover, observed at higher density values of approximately 16 mg/mL. The difference in the crossover density between the free energy barrier height and critical nucleus size reflects the differences in the balance between surface and bulk free energy terms for NDDX4 and FUS-LC.
Differences in the physicochemical character of NDDX4 and FUS-LC are further reflected by their different solubility (captured by ρ d *), which induces different supersaturation levels at the same bulk densities. Approaching the binodal line (the green dashed line in the right inset of Figure 2D, corresponding to a ρ d /ρ d * of 1), both the critical nucleus size and ΔF* diverge. Under these conditions, directly observing nucleation events is extremely unlikely, even in the limit of very large simulations. 31 As such, complete information about the nucleation behavior approaching the binodal can only be inferred from theory. . Stability of (A) FUS-LC and (B) NDDX4 dense-phase droplets in confined volumes as a function of the total peptide density, predicted using parameters fitted from simulation data from Tables 1 and 2. The gray shaded region represents conditions where ρ d°< ρ d * and phase separation is thermodynamically unstable, with the dashed lines representing the predicted equilibrium vapor density ρ d *. Nucleation is thermodynamically favored above ρ d *. The white region represents a nominally supersaturated portion of parameter space; here finite-size effects mean that F(n) (represented as a solid line in the insets) is a monotonically increasing function of n, even if the corresponding free energy in the macroscopic limit (represented as a dashed line) allows for a critical nucleus. In these conditions, condensation is inhibited by confinement (see inset III). In simulations performed under these conditions (indicated with ×) nucleation is prevented by the finite size effects and simulations initialized from droplets relax to a homogeneous vapor. The blue/red shaded region instead represents the ensemble of conditions under which droplets are thermodynamically stable, characterized by the free energy profile of state II. All simulations performed in this region show phase separation. The simulations used to fit the thermodynamic parameters are represented with circles with sizes proportional to the volume of the corresponding steady-state droplets. The blue/red solid line between the white and the blue/red shaded regions is characterized by the free energy profile of boundary state II. All of the free energy profiles in the insets correspond to simulation data points with the same state labels, except for FUS II, which refers to a ρ d°s lightly higher than the actual data point at II*. Inset states of the same numbering are plotted on the same scale for FUS-LC and NDDX4.

The Journal of Physical Chemistry Letters pubs.acs.org/JPCL Letter
At steady state, nucleation simulations provide an extensive sampling of the dynamic exchange of monomers between the condensed and dilute phases. We can exploit this to obtain a quantitative description of the single-chain exchange dynamics, with the aim of complementing the collective information captured by nucleation free energy profiles. For this purpose, we analyze the dynamics of single-chain exchange between the condensed and dilute phases using Markov-state models (MSMs). The details of the MSM construction and analysis are reported in the Supporting Information. Briefly, in the MSMs, the dilute and condensed states are identified on the basis of interchain contacts and the distance between the COM of the individual chains and the droplet (see Figures S7  and S8). As a measure of the exchange dynamics, we compute the rate of escape of a single chain from the condensed to the dilute phase as a function of the droplet size (see Table S1 and Figure S8). Figure 2C shows an approximately exponential decay in the condensed-to-dilute transition rate as a function of droplet size, with the decay of FUS-LC being steeper than that of NDDX4. Moreover, irrespective of the droplet size, it is slower to transfer a FUS-LC chain across the phase boundary than a NDDX4 chain. The estimated rate of dilute-tocondensed transitions appears instead largely uncorrelated with respect to the droplet size, but as expected for a diffusiondominated process, it fluctuates around the same average for both NDDX4 and FUS-LC. The ratio of the condensed-todilute and dilute-to-condensed transition rates is approximately linear with respect to the ratio of the number of peptides in the two phases (see Figure S6). The faster escape dynamics of single chains from NDDX4 condensate droplets impact fluctuations in their overall size and shape. The fluctuations of NDDX4 are generally slightly larger than those of FUS-LC (see Table 1), possibly due to a combined effect of a longer chain length, different hydrophobicity, and faster escape dynamics. In addition, while both NDDX4 and FUS-LC condensate droplets can be effectively approximated as spherical in the theoretical analysis and interpretation of the simulation results, we note that the faster exchange dynamics, higher hydrophilicity, and more gentle radial density gradients (see Figure 1F,G) of NDDX4 lead to larger deviations from a perfectly spherical shape (see Table S1).
Using the results obtained from nucleation simulations, we can rationalize the effect of finite size on the thermodynamics of phase separation by inserting σ and ρ d * into eq 1, thus mapping the qualitative nucleation behavior as a function of the total peptide density and system size. Following this strategy, we produce domain diagrams in Figure 3 that indicate the presence or absence of phase separation for both NDDX4 and FUS-LC. In the blue/red shaded region, exemplified by state I, the finite-sized thermodynamics permit the existence of steady-state droplets corresponding to local minima in the nucleation free energy profiles. Increasing the system size and density in this region results in larger droplets, as demonstrated by the size of the circles used to represent our simulation results. In the white region featuring state III, confinement induces a monotonically increasing free energy curve, 26,29 and nucleation will never occur regardless of the simulation time. The blue/red solid line represents the transition boundary, where the free energy curve has a single stationary point corresponding to a flex. This condition is closely approximated by the free energy profile of state II. Simulations initiated on the boundary line from a preformed condensate droplet of a size close to the stationary point in the free energy profile will experience negligible driving forces to either grow or dissipate. 40 At large system sizes, this effect manifests itself as a very slow evolution of the droplet toward the equilibrium state, corresponding to a homogeneous phase with density ρ d°. Instead, at small system sizes, where far fewer chains are present, the shallow free energy gradient results in large fluctuations in the droplet size. This behavior is confirmed by simulations performed in regimes of volume and peptide density closely approximating these conditions (see Table 2).
Applying a general thermodynamics framework for interpreting nucleation in finite-sized system is a powerful approach for correctly investigating the formation of biomolecular condensates with simulations of reasonable size. This strategy yields consistent estimates of thermodynamic properties of protein condensates, such as the equilibrium density of the dilute and condensed phases and the surface tension. In turn, information about the nucleation kinetics, such as the critical nucleus size and the free energy nucleation barrier, can be obtained.
Here we have used this method to quantitatively characterize the LLPS of two model phase-separating systems, NDDX4 and FUS-LC, by using CG simulations based on a minimal, one-bead-per-residue description. Nevertheless, the same approach can be easily extended to higher-resolution models, 41,42 which provide a more accurate conformational description and may help in rationalizing complex nucleation mechanisms. 3 Furthermore, the effect of system size on these thermodynamic and kinetic properties can be clearly demonstrated and quantified. This is particularly relevant in the context of biological systems, where phase separations take place in micrometer-scale isolated cellular compartments, as well as in the rational development of technological applications of LLPS for material synthesis in micro/nanofluidic devices. 40,43  Initial condition: the final configuration of the system in Table 1 in the increased simulation box or randomly distributed homogeneous one-phase initial state. The equilibration time is approximately 1 μs. Jupyter notebooks used in the data analyses are available for download at https://github.com/mme-ucl/confined_LLPS.

* sı Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpclett.2c03512. Additional details and figures on the clustering algorithms used to identify the number of chains in the dense phase, calculation of the droplet density profiles, Markov-state models for estimating the exchange kinetics, and force-field parameters (PDF) Transparent Peer Review report available (PDF)